Time-Decaying Bandits for Non-stationary Systems
Authors
Abstract
Content displayed on web portals (e.g., news articles on Yahoo.com) is usually selected adaptively from a dynamic set of candidate items, and the attractiveness of each item decays over time. The goal of such websites is to maximize user engagement with the selected items, usually measured by clicks. We formulate this kind of application as a new variant of the bandit problem in which new arms are dynamically added to the candidate set and the expected reward of each arm decays as rounds proceed. For this new problem, directly applying algorithms designed for the stochastic multi-armed bandit (MAB) problem (e.g., UCB) leads to over-estimation of the rewards of old arms, since their empirical means still include rewards collected when they were more attractive, and thus to misidentification of the optimal arm. To tackle this challenge, we propose a new algorithm that adaptively estimates the temporal dynamics of the arms' rewards and, on this basis, effectively identifies the best arm at a given time point. When the temporal dynamics are represented by a set of features, the proposed algorithm achieves sub-linear regret. Our experiments verify the effectiveness of the proposed algorithm.
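The over-estimation effect can be illustrated with a small sketch. The Python snippet below is not the authors' algorithm. It assumes a hypothetical linear-decay model in which an arm's expected reward falls linearly with its age (rounds since the arm was added), uses noise-free expected rewards in place of observed clicks, and omits exploration bonuses. It compares the plain sample mean that a stationary method such as UCB builds on with a least-squares fit over hypothetical age features `[1, age]`, evaluated at the arm's current age. The names `age_features`, `stationary_estimate`, `decay_aware_estimate`, and `true_mean` are illustrative only.

```python
import numpy as np


def age_features(ages):
    """Hypothetical feature map phi(age) = [1, age] (assumed linear-decay model)."""
    ages = np.asarray(ages, dtype=float)
    return np.column_stack([np.ones_like(ages), ages])


def stationary_estimate(rewards):
    """Plain sample mean: the point estimate a stationary method such as UCB
    builds on (its exploration bonus is omitted in this sketch)."""
    return float(np.mean(rewards))


def decay_aware_estimate(ages, rewards, current_age):
    """Fit reward ~ phi(age) . theta by least squares, then evaluate at current_age."""
    X = age_features(ages)
    y = np.asarray(rewards, dtype=float)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(age_features([current_age])[0] @ theta)


def true_mean(start, slope, age):
    """Assumed ground truth: expected reward falls linearly with the arm's age."""
    return start - slope * age


# Old arm: observed at ages 0..49, now 50 rounds old (noise-free rewards, by assumption).
old_ages = np.arange(50)
old_rewards = true_mean(0.8, 0.01, old_ages)
# New arm: observed at ages 0..4, now 5 rounds old.
new_ages = np.arange(5)
new_rewards = true_mean(0.5, 0.01, new_ages)

arms = {
    "old": (old_ages, old_rewards, 50),
    "new": (new_ages, new_rewards, 5),
}
stationary = {k: stationary_estimate(r) for k, (_, r, _) in arms.items()}
decay_aware = {k: decay_aware_estimate(a, r, now) for k, (a, r, now) in arms.items()}

print("stationary :", stationary, "->", max(stationary, key=stationary.get))
print("decay-aware:", decay_aware, "->", max(decay_aware, key=decay_aware.get))
```

With these assumed numbers, the sample means rank the old arm first (about 0.555 versus 0.48) even though its current expected reward (0.30) is below the new arm's (0.45). The age-feature fit recovers both current values and selects the new arm. A practical method would also have to handle noisy clicks and exploration, and the choice of features determines how well the decay is captured.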
Similar articles
Improving Online Marketing Experiments with Drifting Multi-armed Bandits
Restless bandits model the exploration vs. exploitation trade-off in a changing (non-stationary) world. Restless bandits have been studied in both the context of continuously-changing (drifting) and change-point (sudden) restlessness. In this work, we study specific classes of drifting restless bandits selected for their relevance to modelling an online website optimization process. The contrib...
A new adaptive exponential smoothing method for non-stationary time series with level shifts
Simple exponential smoothing (SES) methods are the most commonly used methods in forecasting and time series analysis. However, they are generally insensitive to non-stationary structural events such as level shifts, ramp shifts, and spikes or impulses. Similar to that of outliers in stationary time series, these non-stationary events will lead to increased level of errors in the forecasting pr...
An Asymptotically Optimal Heuristic for General Non-Stationary Finite-Horizon Restless Multi-Armed Multi-Action Bandits
A scalarization-based method for multiple part-type scheduling of two-machine robotic systems with non-destructive testing technologies
This paper analyzes the performance of a robotic system with two machines in which machines are configured in a circular layout and produce non-identical parts repetitively. The non-destructive testing (NDT) is performed by a stationary robotic arm located in the center of the circle, or a cluster tool. The robotic arm integrates multiple tasks, mainly the NDT of the part and its transition bet...
Rotting Bandits
The Multi-Armed Bandits (MAB) framework highlights the tension between acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward. The decision maker’s objective is to maximize her cumulative expected reward over the time horizon. The MAB problem ha...